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Abstract 

A new algorithm, "HiER-leap" , is derived which improves on the computational properties 
of the ER-leap algorithm for exact accelerated simulation of stochastic chemical kinetics. Un- 
like ER-leap, HiER-leap utilizes a hierarchical or divide-and-conquer organization of reaction 
' channels into tightly coupled "blocks" and is thereby able to speed up systems with many reac- 

tion channels. Like ER-leap, HiER-leap is based on the use of upper and lower bounds on the 
reaction propensities to define a rejection sampling algorithm with inexpensive early rejection 
and acceptance steps. But in HiER-leap, large portions of intra-block sampling may be done 
in parallel. An accept/reject step is used to synchronize across blocks. This method scales 
, well when many reaction channels are present and has desirable asymptotic properties. The 

00 ' algorithm is exact, parallelizable and achieves a significant speedup over SSA and ER-leap on 

. certain problems. This algorithm offers a potentially important step towards efficient in silico 

' modeling of entire organisms. 
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1 Introduction 

Computational biology is moving toward ever more complex, comprehensive and detailed biological 
models. It is becoming increasingly important to simulate and understand these models compu- 
tationally. The Stochastic Simulation Algorithm[llJ (SSA) was introduced to exactly sample the 
Chemical Master Equation and has seen widespread adoption. 

The original SSA iteratively samples reaction events in a way that requires 0{R) computa- 
tional steps per sampled reaction event, where R is the number of reaction channels. This can be 
prohibitively slow when there are a large number of reaction channels or reaction events. 

This fact together with the importance of the SSA has inspired a slew of SSA acceleration 
techniques [IOlll[22l[l3l[l6l[3l[2l[Ml[25]. The work of Gillespie [12] and its recent variants [3 [3 [6] 
reduces the total number of reaction events that need to be sampled but does so at the cost 
of accuracy. Additionally, the work of Gibson and Bruck |10] reduces the amount of work per 
simulated reaction event to logi?. The work of Slepoy et al.|29] ups the ante further by finding the 
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next reaction event to sample in 0(1) time using rejection sampling under assumptions reasonable 
for biochemical networks. 

There have been recent advances in consumer level multi-core CPU technology. There are 
indications that next-generation CPU technology is moving from maximizing single-core speed to 
increasing the number of cores by orders of magnitude. There has been work on the parallelization 
of SSA via GPUs [2ll [TBI [H] and multicore CPUs [9]. However, the parallelization was used to 
speed up sampling of many trajectories rather than speeding up each trajectory in a large system. 
Multicore GPUs and CPUs have not been effectively used to speed up the sampling of a single 
Chemical Master Equation trajectory exactly. Arguably this becomes the dominant problem when 
extremely large systems are being studied. For example, the E. coli genome has been estimated to 
have about 4400 gene products|28j. This fact suggests that tens of thousands of molecular species 
will needed to be present if an E. coli specimen is to ever be comprehensively modeled in silico. 

Relatively little work has succeeded in reducing the number of reaction events sampled without 
introducing bias. While the work of Riedel and Bruck[26] is able to skip over cyclic states (eg loops), 
this method of reducing work does not apply to reaction networks with little state cycling. The 
previous work of the present authors (ER-leap) [25] is a leaping algorithm and was the first known 
general method to effectively reduce the number of SSA iterations sampled without sacrificing 
accuracy. This method scales well when reducing the number of SSA iterations. However, this 
method does not scale well when many reaction channels are present. 

The currently proposed work describes a new SSA-equivalent algorithm that can take advantage 
of parallel hardware, and additionally provides an algorithmic speedup for systems with many re- 
action channels. Like ER-leap, this "HiER-leap" (Hierarchical Exact Reaction-Leaping) algorithm 
achieves these advances without the loss of accuracy. The HiER-leap algorithm uses a divide-and- 
conquer strategy to independently sample sparsely connected submodules of the reaction network, 
in a way somewhat similar to ER-leap. HiER-leap then performs a network-wide synchronization 
using rejection sampling. As will be shown, this synchronization step is efficient for "reasonably" 
independent submodules. The acceptance probability associated with synchronization is asymptoti- 
cally equal to one as the number of reaction channels goes to infinity. This implies that the majority 
of the work will take place during submodule sampling, which may be performed in parallel. 

This work therefore presents a potentially important step towards organism-scale simulation. 

2 Background Theory 

The new HiER-leap algorithm begins its derivation from the state transition distribution defined 
by the Chemical Master Equation after L > reaction events. We then algebraically manipulate 
the CME until a distribution suitable for parallel sampling and synchronization is found. 

In many ways this derivation closely follows the derivation found in ER-leap. Therefore, this 
section is dedicated to recalling the notation and key equations from ER-leap |25j that will serve 
as a starting point for the algorithm derivation in section [3l 

2.1 Notation 

We define reaction channels, indexed by r, as a set of input and output species, Ca, with corre- 
sponding input (m^'""-') and output {m'^^°'^) stoichiometries 




with reaction rate pr 
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and the net stoichiometry for a given species and reaction channel as 



Later, we will show the probabilities of state transitions after L "reaction events" occur. 

Under the Chemical Master Equation it is assumed that each reaction channel has a small 
probability of firing during a small time interval dt with probability equal to ar{n)dt. The vector 
n, possibly indexed by a for species type a, represents the quantities of the constituent species in 
terms of raw counts. This 0^(11) term is also called the propensity or rate of reaction channel r and 
is defined as 



ar{n) = prF^ , 



R 



where 



n 




ao{n) = '^ar{n) (2) 

r=l 



■f ^ W ^ 

it m'a 



(na-mi')! |_ (3) 

otherwise 



Note that superscripts involving "r" and related variables that index reaction numbers occur 
here and numerous times in the following. These are enclosed in parentheses "(r)" throughout, to 
indicate they are not powers but rather indexes. 

In this work, it will be notationally convenient for us to keep the propensity term factored out 

(r) 

into pr and ■ 

(r) 

As a brief aside, the upcoming derivations in this paper may work with other forms for Fn . 
For example, the "umbral transformation" of a Hill function, 

Fi''^ = Umbral[Hill{n; K)] 
~ (K« + n(fc))' 

may work as propensity function, where the falling factorial n^^) = ^^"'^.^i replaces n'^ in Hill{n; K) 
(or more generally in a rational function) for each power of any integer-valued molecule number n. 
This functional form has the advantage of being monotonic and equal to zero for n < k, as required 
for a stochastic version of the Hill function with discrete integer numbers of molecules. 
Furthermore, we define the total propensity Dj for some reaction to occur in state / as 

r 

which is equivalent to equation ([2]). 

Bounds for F^^^ and D, after L reaction events, are computed by bounding species counts after 
L reaction events. For each species identifier (ID) a, we bound the number of molecules present 
after these reaction events, n^, by: 

na + L min^ |Am^'"''| ^ n'^ ^ ria + L max^ |Am^^^| . (5) 
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If we introduce the notation that a tilde superscript or subscript, x or x, represents upper or lower 
bounding values respectively then, we can re-write the above as: 

na<n'^< n. 

The corresponding propensities calculated from using the upper and lower bounding states, after 
L — 1 reaction events, respectively are written as 



and therefore 



bK,L-l = Y.PrFi]L-l 

r 



for any state K. 



2.2 Markov Process 



In the ER-leap paper [25] it was shown that the probability of starting at state Iq and ending up in 
state II after r time elapses is 



PljL.rUa.L) 



E 

{i?fc|/c=l..L-l} 



fc=L-l\0 



(6) 



for every ordered vector {Rk\k = 1 . . . L — 1} of reaction channel events. Each state may be uniquely 
transformed by a reaction as Iq Ii{R,Io). 

Furthermore, it was shown in ER-leap \23] that for any function e(r) summing over all possible 
orderings of L reaction events is equivalent to summing over all possible counts of reactions (eg 
a multinomial with L draws) and then permuting each of these draws for all unequal reactions, 
yielding 

{'''k\k=l--L—l} {s|sr£N,J^j, Sr=I/} -fcrlo" permutes unequal r's|s} 

which, when combined with equation ([6]), and introducing the previously defined bounds, separating 
out terms in e(r) which are permutation invariant, and after some algebra results in 



P{Il,t\Io,L)= E 

{s|SreN,E^Sr = i} 



L 

Si ... SR 



R ( ^pir) 



n 

r=l 




DloL- 



^^Pi-MDl^(a{r),Io)JkHr)Jo) - ^ , )) 



(8) 



{a\s} 
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This expression can be interpreted as a rejection-sampling algorithm (last line) that corrects a 
multinomial approximate sampling algorithm (first two lines). 



2.2.1 Rejection Sampling 

Through the lens of rejection sampling, equation ([8]) represents an algorithm. 

Briefly, rejection sampling is a method to sample x from some distribution, x ~ P{x)^ by 
means of an approximate distribution P'{x). This can be expressed algebraically since P{x) can 
be rewritten as 

= ^'^""^W^) + " ^/^^ ^^"^^ 

assuming M > 1. Equation ([9]) can be viewed as a mixture distribution with the probability of 
sampling from P'{x) being the "acceptance" A{x): 

for some constant M such that \lxA{x) < 1. 

It is now possible to see the ER-leap algorithm represented in equation ([8|). If in equation ([8]) 

we recognize P{x) = P'{x)MA{x) (equivalent to equation Q), with M = (^Dj^l-i^ / {BiqL-i)^ , 

then we will implicitly define a P'{x). This P'{x) has the next L reaction events sampled from a 
multinomial with the probability Pr of choosing the r*'^ reaction channel being equal to 

/ 



Pr 



Furthermore, our P'{x) samples r from an Erlang distribution (equivalent to a Gamma distribution 
with integer "shape" parameter) with rate parameter being Di^^l and shape parameter being 
L. Finally, if needed when calculating A{x), a random permutation a is drawn uniformly and 
{''"All''" = Z]fc=o '^k} is sampled from an L-simplex. 

The work in section [3] will similarly arrive at an equation representing an efficient and exact 
leaping algorithm for sampling L reaction events from an SSA equivalent distribution. 



3 Theory 

3.1 Hierarchical Notation 

The HiER-leap algorithm uses a divide-and-conquer strategy to accelerate SSA. Evidence suggests 
that protein-protein interaction (PPI) networks tend to be modular |24j . These networks contain 
submodule clusters that interact heavily inside the cluster. Interactions with other clusters of 
proteins are less common. Although still an active area of research, evidence |14j suggests that 
similar modularity may exist in genetic regulatory networks as well. Additionally, when modeling 
spatial interactions [231 113 ED IISl [13 [2Q] , events spatially distant must interact through sparse 
intermediate diffusion reaction channels. In this way, it is probably common that many reaction 
channels are weakly coupled to the majority of other channels. This observation suggests a potential 
avenue towards algorithm acceleration and parallelization for large biological networks. 

Notation is introduced below to describe a hierarchical organization of reaction channels. Table 
([T]) provides a comprehensive guide to notation used throughout the following sections. Next, 
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following and generalizing the strategy of section [21 we will derive bounds on propensities and 
species. The bounds will be essential for deriving an algorithm for exact speedup of SSA for 
systems amenable to hierarchical organization. 

Reaction channels must belong to exactly one block. A block is defined as a set of reaction 
channels. If reactions are "connected" by shared reactants, it is preferred that reactions should 
be more strongly connected within than between blocks. For this work, a two level hierarchy of 
reactions and blocks is used. However, it is straightforward to apply this method repeatedly to 
multiple levels. 

Each reaction channel is indexed by its block ID ri, and its within-block ID r2, and will be 
designated as i? = (rir2) for ri € {1 ... 6} and r2 € {1 . . . 6,.^}. The "block propensity" for block ri 
and state I, denoted Dj is the sum of propensities of constituent reaction channels. Specifically, 
similar to equation this means 

D^r^ = Y: Pr.rA'''\ (11) 

Furthermore, we denote the number of reaction events occuring within block ri as ti^ . Finally, 
the number of events for the reaction channel indexed by i? = (rir2) is denoted by Vr-^r2- 

3.2 Bounds on Propensities and Species Counts 

Similar to equation ([5]), we now develop bounds on species counts and propensities. This enables 
us to derive a two-scale rejection sampling algorithm in many ways analogous to ER-leap at each 
scale. For reasons that will become evident in section 13.31 we first derive bounds on the block 
propensities given L and /q. Afterwards, bounds will be developed on the species molecule counts 
and reaction channel propensities given u. 

First, recall that in equation ([5]) we found bounds on species and propensities after L reaction 
events. Note that similar to equation ([5]) we can define 



f)(r-i) _ p{nr2) 
^K,L-l — Prir2^ K,L-l 

r2Gri 



and a similar definition for L~v 



3.2.1 Optimized Block Level Bounds 

If it is the case that we only need bounds on the block propensities, and not individual reaction 
channels, then we can take advantage of "reaction event exclusion". This means that we only need 
to consider the sequence of at most length L reaction events which will result in the most extreme 
value for the sum of propensities in block ri. Therefore, we no longer need to assume that all 
species counts are at the most extreme value possible after L reaction events. 
We want to find a bound closer to the optimal block propensity 

D^/j']* = max y Pr,rM]Y^^ V (12) 

111 1 T2&r\ 

Unfortunately, naively solving this exactly for ri requires enumerating {br^)^ possible choices for 
upon every iteration. Fortunately the bound we seek, D^jjj^y is not required to be exactly 
optimal. Instead we only require that 

5^' ; < 6<;;;> , < o|;;;> , (is) 
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Symbol Meaning 

Upper bounding value for x after L — 1 reaction events. 

Calculated by assuming each species type will be maximal. 

Upper bounding value for x after L — 1 reaction events. 
X May not depend on bounding all species values and 

therefore may be tighter than x. 

Upper bounding value given u. 
X Will often involve inner block calculations. 
Z,Si,x Lower bounding versions of the above definitions. 
X* The optimal value of x with respect to some objective function. 



Table 1: Notation: Accents and Meaning 



such that D)j L < D)j i-, is required for algorithmic correctness and D), < D), is needed 
for improved efficiency. 



)(n) 

values and has 'nice' asymptotic properties that will be discussed later. 



A heuristic algorithm for D^^^j^Jj^^ is developed. We demonstrate this falls between the requisite 



Derivation The idea is to find the maximum AD^jJ^-^ possible resulting from one reaction channel 
firing sometime during the next L reaction events, if we determine this value, we can upper bound 
Z)|?^ r , with 



where 



<%<<^^+(^-i)A5[;:i)* (14) 

-^'■l''2ll-fo---fL-l 



Note how this is an upper bound on D^jJ^y By construction, AD^j^^^^ is the largest amount that 
the block propensity may change for any of the upcoming possible (L — 1) reaction events in ri. 
Since there are (L — 1) reaction events, and the most any of them may increase D^^^j^J^ is AL)|^^]^^ , 

equation dH]) will always bound D^^^^J^y 

This method improves upon our previous methods, which found the maximum ha for all species 
and then calculates the block propensity. Each within-block reaction channel propensity will be 
larger when ha rather than Ua is used to calculate the propensity. Therefore, using the increased 
bound will result in a block's propensity being 0(6r-i *L,) larger than D^fJ^ ■ However, by calculating 



using equation (|14|) the bound will be just 0{L) larger than Dj 

)iri) 
(loL) 



(n) 



Again, naively solving for AD'fj^X. requires an impractical amount of work. But as with our 



previous argument, we can upper-bound AD^^^^^-^ and still achieve an upper bound for D^jjj^y To 
upper bound Ad|^^|^^ we use the monotonic nature of Dj'"^\ If any species increases to n'^ > Ua 

(ri) (ri ) 

we know that D , > Dn^ . Therefore, if we find the reaction channel that increases the block 

a 

propensity the most when hi^^L is used for positive Ami^^'^^\ we are guaranteed that there does 

^ (ri ) 

not exist a larger AD), i-,. 
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This yields 




max D^'''\ci{h,r2)) - D^'''\ii) 



(15) 



where 



Ua + Am, 



(rir2) 



if Am^^^) 



> 



q(n,r2)a 



■a 



(16) 



otherwise 



as our final equation for AD^j^ j^y A proof that this bounds the maximum delta possible can be 
found in the appendix. 

This tighter bound will result in a greater acceptance ratio. The basic reason for this improve- 
ment is that we need not overestimate every propensity in ri by 0{L), and add the overestimates 
up, since only L and not biL reactions will occur. 

Naively finding the reaction channel R = {rir2) that will increase D^'^^\niQ^L) by a maximal 
amount will cost 0{\Rr-^\) steps to compute. To accelerate this step further, blockwise priority 
queues (PQ) are used to find this R efficiently. Nodes in the PQ are reaction channels and values 

(ri) 

are the AD^j^i ^ ^ caused by each reaction channel firing. Upon acceptance of L reaction events 
we must update the priority queue for each block. Only nodes that interact with species which 
have changed, need to be adjusted. This, at worst, will be 0{logbr^) work for each node, although 
in practice the order rarely needs to change. 

3.2.2 Propensity Bounds Given u 

If we know n, the number of reaction events for ri and adjacent blocks, we can derive even tighter 
bounds on the reaction channels Rr^*. In fact, these tighter bounds help us to efficiently increase 
L when larger systems are considered, as will be demonstrated in section [3^ 

We determine F^^^*) by finding bounds on species counts given u. In other words, we want to 



which is the maximum possible value of prior to the last event in ri occurring. In this way 
n-Air^nQ, k) will never exceed the propensity calculated from nAri{u,no). 

Finding the optimal value for nAri{u,nQ) is straightforward. We first need to consider blocks 
other than ri which may change UAn- Since the order of reactions is unknown, we must assume 
that all reaction events in blocks except ri, written as u\{urj^}, occur prior to those in u^^. It is 
desired that the number of neighbors relative to the total number of blocks will be small. This 
will decrease nA{r,nQ,k) and ultimately lead to a more efficient algorithm. Secondly, we need to 
consider reactions in ri. In calculating the bound, it is assumed that all (un — 1) reaction events 
chosen will behave adversarially. This is analogous to the method considered in section 13.21 with 
the modification that we will consider a subset of reaction channels. Thus 



will bound each riAn with respect to ri and u. The Kronecker delta function 6{a, b) or 6{a — b) is 
as usual: 



find 



"•a(?'5 "-0; ^) ^ f^Ari ("U, no), for k = 0.. [(index of final ri event) — 1] 





Finally, the propensities of reaction channels inside of block ri are bound as 



F(^i^2)(n,no) = F(^^^^)(n,nAn). 
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As in ER-leap, lower-bounding the propensities and species is done with the same techniques 
as that used for upper-bounding with the restriction that propensities and species molecule counts 
cannot go below zero. These derived bounds are used in the following sections. 

3.3 Equivalent Markov Process 

Similar to section 12.21 we want to algebraically manipulate the distribution represented by the 
Chemical Master Equation (a special case of the Kolmogorov-Chapman equation [30]) into a form 
suitable for parallelization and acceleration. The hierarchical description from section 13.11 will aid 
us in this transformation. 

First, note it is possible to rewrite equation ([7]) into a hierarchical version with it's and f 's 
strictly ordered such that 

E <^)= E E 

^ ^ e(ai(a2(R))). 

{cri\cri permutes unequal i?/s|u} jo-j |ct2 permutes unequal R'slvr;^} 

By taking an average of e(o"(R)) and weighting by the number of ways the selection may occur we 
get 



and analogous to the way shuffling a deck of cards is the same as shuffling by suit and then, 
maintaining that order, shuffling by value independently for each suit, we may write 



L \ LI 



RlR2---RnJ R1IR2I ■ ■ ■ Rn^- 

LI 



n 



and arrive at a useful form for our distribution, which is already suggestive of a block-parallel 
algorithm: 

To go further, we need to re-examine e(. . .). 
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3.3.1 Introduction of Probability Bounds 

We now make use of our previously derived propensity bounds to derive a parallel algorithm. From 
equation ([6]) we have 



[{e{a,ia,m))J„^ = ( ( 11 Z'^^^Wo) ^M-rkiDj,^n,io)Mn,io))) 

\ \fe=L-l\0 



with inclusion of derived bounds, 



If we separate out terms based on independence of ai, 02 and u, then 



n 



X 



\k=L-l\0 ^ ' 



We now substitute the expression for (^{e{ai{a2(R}))) into equation (fT7|) . combining terms 
where appropriate: 



n W expi-TkD) 

k=l-l\0 



D 



{Io,L) 



{u\uneN,Y,iiUji=L} 



L 



1Xr--\ ... ULy,! 



D 



in) 



E 



n 



Vr-ir2 ■ ■ ■ '^nr! 



n 



PR 



(loL) , 



2^ r2 



l>('^i)(m,Io) y 

(-^(/o,i^)) exp(-TZ)(/Q2.)) X AcceptCoarse{u;Io,L)x 

iY\_^cceptBlock{vri,(72;u)\ x AcceptFine{ai;u,v,lQ,a2) (18) 
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The acceptance probabilities are as follows: 



AcceptCoarseiu; Iq, L) = J] ( ^^'^(^^"/"^ ) (19) 

p{Rk) 

AcceptBlockivr, , ^2; n) = J] p(JS{'^\ ^ (^0) 



AcceptFine{ai;u,v,Io,a2) = exp(-rfc(D4(R_/„)_4(Rjg) - z5(/pL))) (21) 

fc=L-l\0 

Furthermore, prior to turning these equations into an algorithm, we note that we can lower- 
bound these acceptance probabilities. This will enable us to do an early acceptance or rejection 
without always doing all of the work to calculate these values exactly. 



3.3.2 Lower Bounding Acceptance Probabilities 

We begin by lower-bounding AcceptFine{. . .). This probability requires the most work to calculate 
and as we will see may be bound fairly tightly. The bound only requires that r has been sampled. 
The lower bound AcceptFine{. . .) is sought such that 

AcceptFine{. . .) < JJ exp(-rfc(L»4(R,/o),4(R,,/o) " U{hL)))- 

fc=L-l\0 

for all possible {t/j} and {Iq . . . In the above, note that D^^i^i^-^ is constant with respect to k. 

Therefore, when we also upper bound 

Di^L > ^/fc(R,/o),/fc(R,/o)' 

this creates an easily computable expression for the lower bound 

AcceptFine{ai;u, v, Iq, c72, r) > exp{-Tk{Di^L - 5(/oL))) 

fc=L-l\0 

so that 

AcceptFine{T; Iq, L) = exp{-T{DjgL - D^hL))) (22) 

Furthermore, recall that E[t] = LjD^i^^^V)- If we assume that AZ)(7-g2.) ^ when computing D^^q 
and D^j^^ (see equation (jl4p ) in the limit of many non-zero propensity reaction channels 

( lim exp(-r(57oL - 5(/oL))) ) ^ 1 

which implies that both AcceptFine{. . .) and AcceptFine{. . .) tend to unity as the number of 

reaction channels increases. 

Next, we set out to lower-bound AcceptBlock{. . .). This acceptance probability depends on 02- 
Therefore, work will be saved if we can calculate the lower bound without sampling a2- This can be 
accomplished by noting that AcceptBlock{. . .) is a product of fractions. If we have a numerator and 
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denominator that are independent of 02 we can re- write this equation in terms of r2. Specifically, 
using F^^[^^ allows us to lower-bound the equation. 

AcceptBlock{vr,,a2;u,n) = {[ > || 



yielding 



3.4 Algorithm 



AcceptBlockivr, Jir^J^iJ) (23) 



The above equations, along with rejection sampling, allow us to create an efficient algorithm that 
will allow much of the work be done in parallel. From equation (|18p observe there are two probability 
mass function expressions for a multinomial distribution. Specifically, 



Multinomial {u; < 



is the multinomial distribution for sampling u. And for each ri the vector is sampled as 



Multinomial{Vr^; \ p^ra = — r^urr, — T\ — r '^n. 



n 



'pR,F('-i'-2)(n,/o)^ 



U^.ra • • • Vrxr'^J \ D('"^){u, Iq) J 

which is interesting and implies is independent of all other block's v^i_^ given u. 

The multinomials may be sampled independently for each block, however it may be the case 
that equation (pO|) needs to be computed by considering multiple blocks simultaneously. Specifi- 
cally, computing /^(R, /q) for block ri may require knowledge about any neighboring blocks, r[, 
changing the chemical species counts for reaction channels Vr^ reacting on the same species. Since 
equation (p3]) is independent of . .), this "joint" acceptance probability only needs to be calcu- 
lated when (a) blocks ri and r[ share a chemically reacting species in their respective Vn and Vj.'^ 
sampled reaction channels and (b) block ri or r[ does not pass early block-acceptance. Computing 
equation (j20p jointly involves computing a a2 for reaction events in ri and r[. Then, . .) may be 
computed properly. It should be noted that in the pseudocode below some possible optimizations 
(eg independent early block accept and some parallel execution) are not shown for clarity. Instead 
"connected components", block groups which may need to be sampled jointly if conditions (a) and 
(b) are met, are sampled entirely in "joint" form. 

We now present the HiER-leap algorithm, which is a realization of the aforementioned equa- 
tions, in pseudocode. First, note that if we have an early global acceptance, then most of the 
computational effort will be put into line 12 of the following pseudocode. The subroutine from this 
line will be shown later. Notice that this function is independent for all blocks, with the exception 
that computing equation (f20l) may need to be done jointly for neighboring blocks, and needs to be 
done for all blocks with at least one reaction event. This is an ideal scheme for parallelization and 
is done so with good efficacy as will be shown. Furthermore, for the tests in in section 14.21 the full 
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calculation AcceptFine{. . .) was rare because in general: AcceptFine{. ■ ■) > 0.995. The algorithm 
is as follows. 

Require: D^/^^), Z)(/jj^^), {D^^j^^^j^^} precomputed for L > 1 and Jo- 
Ensure: Return (//,, At) > return updated state and duration of L steps 
function HiER-Leap(/o, L) 

T ^ ERLANG(T;5(/„_i),L) 

U ^MULTINOMIAL(ti; < Pr-. = ' , > , L) 

Compute -D's t> May be done in parallel. 

5: In accordance with equation ()19p . 

if UniformRandom(0,1) >AcCEPTCOARSE(ii; Iq, -Z^) then 

return HiER-Leap(/o, L) > Early Rejection. Try again, 

end if 



10: for all c G ConnectedComponents(u, i?, /q) do 

{vri,o'2) SampleConnectedComponents(c, n, Jo) 



i> See algorithm below. 
> May be done in parallel. 



end for 

15: Z -ir- UnIFORMRANDOM(0,1) 

if z < AcceptFine{T; Iq, L) then > See equation (f22]l . 

return {Ii{Iq,v,u),t) > Early Acceptance, 

end if 

> Computation should be rare; see equation (l2T]l . 
20: if z < AcceptFine((7i; ti, V, /o, (72, r) then 
return (/^(/o, u), r) 
else 

return HiER-Leap (Jo, > Try again, 
end if 
25: end function 

Ensure: Return each connected component of blocks, where two blocks are in the same connected 
component if they share a reaction species and each blocks has at least one reaction event such 
that ^ 1- 

function CONNECTEDCOMPONENTS(n, R, Iq) 

C {} i> C is a set. 

B R i> is the set of blocks. 

30: while B ^ do 

h <— B.ChooseElement() 
if life > 1 then 

c ^ DepthFirstSearch(6, n, B) 

\> Blocks share an edge iff for each Ur^ > 1 and they share a species. 
35: B ^ B\c > Set operation subtraction. 

C ^ C.Append(c) 
end if 
end while 
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return C 
40: end function 

The pseudocode to sample each connected component is as follows. 
Require: The connected component c contains blocks which all have at least one reaction event, 
function SampleConnectedComponents(c, u, Jq) 
pEarly ^ 1 
for all ri G c do 



-Multinomial {vr-^ ; < 



1^2 



D(^i1{u,Io) 



for r2 € ri and ?^(rir2) — ^ 



pEarly ^ pEarly x 
end for 
end for 

z ^ UniformRandom(0,1) 
if z < pBlockEarly then 

return (v, o"2) 
end if 

pComponent ^ 1 

CT2 ^ Permutation J 

for all = 1 . . . |v| do 

r'2 ^ 0-2(v,/c) 



F(''1''2)(m,Jo) 
F(''1''2)(u,/o) 



> Early Accept. 
i> Must compute exact acceptance probability. 



pComponent ^ pComponent x ^^^^^/"j'^ 
end for 

if z < pBlock then 

return (v, 02) 
else 

return SampleConnectedComponents(c, u, Iq) 
end if 
end function 



Calculating takes the most work. 

Accept sample. 
> Sample rejected, try again. 



4 Numerical Experiments 
4.1 CaliBayes Validation 

We check the HiER-leap algorithm correctness numerically with the CaliBayes test suite similar to 
the work in ER-leap [25j. If is possible to solve analytically for P(X\t), this allows us to compare 
many simulated trajectories to the true distribution defined by the CME. Since HiER-leap reduces 
to ER-leap when the number of blocks goes to one, and it has already been show that ER-leap 
samples the correct distribution, we wish to test across a variety of reaction channel quantities and 
organization structure. 

The reaction networks in CaliBayes for which we know the analytical solution involve at most 
two species types. However, simulating many replicates of these networks on a grid, not connected 
with diffusion, will allow us to treat each block as an independent sample. We can then treat the 
simulation of many network replicates as many sampled trajectories of a single network. 

We perform tests over a number of network replicates m = 2 . . . 1000. The number of blocks 
range from b = 1 . . . m. The leap is in the range L = 3 ... 18, where the leap used depends on the 
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specific reaction network, m and b, but is held constant throughout the simulation. 

CaliBayes models 1-01, 1-03, 1-04, 2-01, 2-02, 3-01 and 3-02 [8] are tested, on the spaced defined 
by the Cartesian product of the possible values for the m, b and L parameters as described above, 
for parameters which result in an acceptance probability greater than about 0.05. These tests pass 
on these cases using the criteria of Evans et al. [8] 

We now turn to a large, spatially coupled system. 

4.2 Acceleration 

As an exact algorithm, the key performance metric of relevance to HiER-leap is the amount of 
acceleration achievable. As discussed earlier, in principle adding more reaction channels and pro- 
cessors should increase the relative speedup over SSA. We can see this trend experimentally in 
figure [T] and figure [21 

Reaction Channels on 2D Grid vs. CPU Time 

10= 
10* 
10= 
10' 

0) 

E 10' 
i- 

10° 

10"' 
10-' 



10' 10' 10= lo" 

Network Replicates 

Figure 1: The Williamowski-Rossler model as seen in section [4?2] is used for this experiment. There 
are different number of network replicates on a 2D square grid with diffusion rate of 0.1. The 
number of replicates ranges from 4 to 8649 which equates to 64 to 189612 reaction channels. 
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Reaction Channels on 1 D Grid vs. CPU Time 




10' 10' 10' 10" 

Network Replicates 



Figure 2: The same experimental setup as used for figure [T] except ID diffusion is used. 
We test using a spatially coupled version of the Williamowski-Rossler model [32] defined as 

k2 kfi kio 

X + Y^2Y X + 

k4 ks 

replicated over a d-dimensional grid for d = 1 or d = 2. Diffusion reaction channels for all species 
are added between adjacent grid cells with a rate of = 0.1. Parameters and initial conditions 
for each of the replicated Williamowski-Rossler grid cells are as follows: ki = 900, k2 = 8.3 x 10~^, 
k3 = 0.00166, k4 = 3.32 x 10-^ k^ = 100, fcg = 18.06, kj = 0.00166, kg = 18.06, kg = 198, 
kio = 0.00166. X{0) = 39570. Y{0) = 511470. Z{0) = 0. 

The following tests are all run on an Apple Macintosh Pro with a Quad-Core Intel Xeon processes 
running a total of 8 cores at 2.26 GHz and 13 GB of RAM using OS X 10.6.8. The algorithms 
are coded in C+-|- and Boost.Thread [18] and the Intel Threading Building Blocks [Ij are used 
for multithreading. Connected components were found using the depth-first search algorithm. 
We compiled the code using the LLVM compiler 1.0.2. The HiER-leap code may be found at 
^http:/ / computableplant. ic s. uci. edu/hierleap/ . 

Results are shown in figures [1] and [2j They show a substantial speedup of HiER-leap over SS A 
and ER-leap, around lOOx and lOx respectively, as we increase the number of reaction channels 
to around 190,000 . The spatial nature of this experiment means that blocks are neighbors with 
relatively few other blocks. This leads to a greater "coarse-scale" acceptance probability and 
therefore increased efficiency. 

Additionally, we see that the slopes of the log-log runtime plots for SSA and ER-leap become 
nearly equal as the number of reaction channels increase. This is expected, since ER-leap finds 
bounds on individual reaction channels after L reaction events, and this bound is independent of 
the number of reaction channels. HiER-leap however does not have this shortcoming and has a 
lower slope (eg better asymptotic behavior) as a result. 

4.3 HiER-leap Properties 

The algorithm parameters, such as leap size and hierarchical organization, require optimization 
before the fastest possible execution time is achieved. To find the ideal methods with which to 
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optimize our algorithm, we explore various trade-offs here. 

In figure [3] we observe that the optimal b and L are interdependent for a given network. How- 
ever, it is interesting to note that for this experiment there is a relatively large plateau of nearly 
equivalent optimal running times. This means that the range of reasonably good parameters is 
large. Furthermore, the contour plot of figure [3] indicates that there is only one global optimum. 
This seemingly convex behavior indicates that finding the optimum requires only a simple hill 
climbing algorithm. 



Contour Plot of Log Time 




6 10 15 20 25 30 35 

Network replicates/block 



Figure 3: Log CPU Time vs Leap and Hierarchical Structure. The Williamowski-Rossler model 
as seen in section 14.21 is used for this experiment. There are 400 network replicates on a ID grid 
with diffusion rate of 0.1. The model execution time depends on leap and hierarchical organization. 
As leap increases the amount of work per iteration goes up but the acceptance ratio goes down. 
Furthermore, if there are many reaction channels per block the total acceptance probability of 
the system goes down. However, in this situation the inner-block acceptance probability goes up. 
When the number of reaction channels per block goes down, the opposite trends occur. In this way 
the chosen leap and block organization will determine the total execution time. 

Thus, the results from figure [3] indicate that finding the optimal L and hierarchical organization 
for a spatially distributed system is an easy optimization problem. These results, and those from 
ER-leap, suggest that L will generally have a local optimum that is also a global optimum. How- 
ever, the optimal configuration of the blocks and reaction channels for networks not specifically 
representing a spatially distributed reaction network remains an open problem. 

5 Summary 

We have presented a novel accelerated stochastic simulation algorithm which has demonstrated an 
ability to sample from the CME without a loss of accuracy. Due to its hierarchical design, this 
method (a) scales very well with the number of reaction channels and simultaneously (b) takes 
advantage of parallel hardware for single trajectory samples. As far as we are aware, this is the first 
exact accelerated algorithm with either property (a) or (b), and is therefore of potential significance 
to the computational biology community. 

Open questions and future work abound. For example, it is not know how well this method 
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works on 'real networks' of substantial complexity taken from biological modeling practice. We 
believe that modular structure in biological networks will make the method particularly useful. 
Additionally, it is unknown how substantial increases in the parallel architectures of future com- 
puters will increase performance. 
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Appendix 

We will show that for AZ)|^^^^^ from equation (jl4p and AZ)|j^]^^ from equations (jlSp and (jl6p . it 
is the case that /:\D'fj'\* < AdI','-],. 

Proof by contradiction. Assume there is some r2 S ri and state /' = n' with Van^ < ha and 
3(jn^ < ha, reachable from Jq in at most L — 1 reaction events used to find AZ)|j^|^^ such that 

AD^j^J^^ > AD^j^J^y Substituting in our definitions for AD^j^^^ and AD^j^^^ , using equation [H 
and introducing the notation that I{r2) will be the result of r2 applied to I and I{r2~^) is the result 
of r2 applied to / only for species which have net gain (Am^"^^ > 0), yields 

a5(7J/ = d(;^\-z)(?) 

(loL) I'{r2) I' 



and 

Therefore, we can equivalently say that we are trying to disprove 

P{rir-) 

r''eri 



E Pinr'^) (^^'■^'^'^(n + Am(^^^2)) - F^^^^'^\h)) . (24) 



r''eri 



Note that by grouping terms by r!^, there is a one-to-one correspondence between the summation 
terms on each side of the inequality. 

If true, equation (|24p implies that there is at least one reaction channel r2 G ri for Am^^^'^^ > 
such that 

i7(-i-2)(n' + Am("i"2)) - F('^i"2)(n') > F^'^''~'^\h + Am^'''''^^) - F^'^''^\iv). (25) 
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But we will show that this is impossible for any ha > n'^> 0. Note that we do not need to consider 
Ami'^^'^^^ < because F is monotonic, the LHS will be decreased and the RHS will not change as 



per the definition of IS.D^^j^^j^ (negative Ami'"^'^^^ are ignored). 



Before proceeding we will introduce the forward difference operator, A^(j), such that 

^F{r)f{z)^f{z + i)-f{z) (26) 

for any function f{z). 

Furthermore, ^'^^'^''^•'(n) can be decomposed by species into terms including chemical species Ca 
and those which do not. Following from equation ([3]), this allows us to rewrite F^'^'^^'^\n) as 



F('^^'-^)(n)=G('-i'^^)(„\{n4)x(nJfc 

for some constant G*^^^^^)(n\{na}) > which does not depend on n^, where k 
stoichiometry for reaction r'2 and species C^, and 



m, 



is the input 



n! 



[n)k 



{n-k)\' 

For equation ()25p to be true there must exist a species Ca such that 



(27) 



is true. All of the above F^^^''^) are calculated using rii, = n'\{na} and G {n'^^ha}. When we 
show that n'a will not result in a greater delta than that offered by using ha instead, this implies 
that equation ([25]) may never be true. 

Equivalent to equation ([27]) . by dividing out ^'^''^''^^(n') > 0, using equation (|26|) . and setting 
m = Arria we arrive at 

^F{m){ha)k - AF(rn){n'a)k < 0. (28) 

However, because n'a < ha, if it is shown that Ap(^)(n)fc is monotonic in n then this will imply 
equation [28] is false. 

Therefore, it just remains to be shown that Ap^^-j{n)k is monotonic in n. Consider the following 
equation which tests for monotonicity 

^F{m)(.n + l)k - ^F{m){n)k 

= [AF(i){n + m)k + . . . + A^(i)(n + 1)a;] - 
[A^(i)(n + m - l)k + . . . + Ap(i)(n)fc] 
= Ap(i){n + m)k - Ap(i)(n)fc 



= k 

= k 
> 



k{n + m)k-i - k{n)k-i 
(n + m)! 
_{n + m-k + iy. {n-k + 1)1 



{n-k + l) 



n + m 



n + m — k + 1 



... X 



n + 1 
n-k + 2 



because k > 1 implies every factor in the long product is > 1 . This implies monotonicity. Therefore 
equation ([27]) is false for all Ca, implying equations ([25]) is false, as was to be proved. 

□ 
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